Laboratory 2

Microstructures and Trading Systems



Moises Flores Ortiz, BS Student, if722183@iteso.mx

06.2022 | Repository: Link


APT and Roll Pricing Models

Implementation of Asset Pricing Theory Model and the Roll Model: Microstructure pricing behavior



Abstract

In this project, the calculations of the APT model and the Roll model are presented, together with graphs that help explain the behavior of the bid, the ask and prices in general. In terms of programming, the classes generated in laboratory 1, with their methods and attributes, are reused under the approach of treating the order book as an object. Two classes are added: one that contains the models and another that contains the methods for the visualizations.

After carefully analyzing each model, it is concluded that the models work to explain the behavior of prices under their respective assumptions, since it can be visualized in the data and graphically.


1. Introduction


This project was developed with the intention of exploring the behavior of prices in the microstructure of the market. The two models to be analyzed are the Asset Pricing Theory model and the Roll model. The first is about the verification of a stochastic martingale process in prices and the second is about the calculation of the theoretical bid and ask through the calculation of the equally theoretical spread.

A JSON file extracted from Bitfinex, containing one hour of order books, is read. The order book is treated as an object whose characteristics are translated into methods and attributes in code, that is, the OOP paradigm is used. For this reason, the classes generated in laboratory 1 are reused, together with a new class that contains the models and another for the corresponding graphs.

The document first presents the calculations for the APT model, which show that the martingale process holds only partially, together with the reasons for this and graphs to visualize it. It then presents the bid, ask and spread calculated with the Roll model against the observed bid, ask and spread, as well as graphs of their comparison.


2. Install/Load Packages and Dependencies


In addition to the packages listed in section 2.1, the project's own modules must be imported: the class that transforms the raw data, the class that computes the order book measures from which the prices are obtained, and the classes that contain the models and the visualization methods.

2.1 Python Packages

To run this notebook, the packages listed in the requirements.txt file must be installed:

In [1]:
# %%capture

# Install all the pip packages in the requirements.txt
# import sys
# !{sys.executable} -m pip install -r requirements.txt
  • pandas>=1.1.1
  • numpy>=1.19.1
  • jupyter>=1.0.0
  • chart_studio>=1.1
  • plotly>=4.14
In [2]:
import pandas as pd
import numpy as np
from functions import PublicTradesMeasures, OrderBookMeasures, PricingModelsOB
from data import DataPreparation
from visualizations import PlotsModelsOB

2.2 File Dependencies

In this case, a collection of order books is used, extracted as a JSON object from bitfinex, in which the prices, volumes and timestamps of each book are contained.

The following are the file dependencies that are needed to run this notebook:

  • files/orderbooks_05jul21.json: Orderbooks

3. Data Preparation


To prepare the data, the DataPreparation class is instantiated. Its method order_books_json_transformation reads the raw data from the JSON file and converts it into a dictionary with one dataframe per order book, each containing the bid and ask prices and their volumes.

In [3]:
data = DataPreparation()

order_books_data = data.order_books_json_transformation('files/orderbooks_05jul21.json')


4. Data Description


As discussed in the previous section, each order book is a dataframe, identified by its timestamp, that contains the bid, the ask and the corresponding volumes. The order book is sorted as expected: the top of the book holds the highest bid and the lowest ask.

In [4]:
display(order_books_data['2021-07-05T13:06:46.571Z'].head(),order_books_data['2021-07-05T13:06:46.571Z'].tail()) 
bid_size bid ask ask_size
0 0.000400 28270.0 28275.0 0.025405
1 0.009787 28269.0 28276.0 0.516810
2 0.008168 28268.0 28277.0 0.005044
3 0.995787 28266.0 28278.0 0.377374
4 1.038704 28265.0 28280.0 1.179715
bid_size bid ask ask_size
20 2.809104 28244.0 28296.0 3.424052
21 0.756619 28243.0 28297.0 0.005064
22 0.697787 28242.0 28298.0 1.192474
23 0.377316 28241.0 28299.0 0.847424
24 0.010000 28240.0 28300.0 0.802599

It is important to emphasize that this project uses only the top of the book of each order book, since it holds the bid and ask closest to a transaction price, that is, the price level at which the market is valuing the asset. By definition, the bid is the most a buyer is willing to pay for an asset and the ask is the least a seller is willing to accept for it. With these bids and asks, a general order book is built to which the mid price or weighted mid price methods can be applied; these prices are the inputs for the models.
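The top-of-book selection described above can be sketched as follows. This is a minimal illustration, not the code of the OrderBookMeasures class: the function name top_of_book and the miniature dictionary are assumptions, and the sizes of the second book are made up (only its bid and ask match the notebook output).

```python
import pandas as pd

# Hypothetical miniature of the dictionary of order books: one DataFrame per
# timestamp, sorted so that row 0 is the top of the book.
books = {
    "2021-07-05T13:06:46.571Z": pd.DataFrame({
        "bid_size": [0.000400, 0.009787],
        "bid": [28270.0, 28269.0],
        "ask": [28275.0, 28276.0],
        "ask_size": [0.025405, 0.516810],
    }),
    "2021-07-05T13:06:51.077Z": pd.DataFrame({
        "bid_size": [0.5, 0.1],      # illustrative sizes, not from the data
        "bid": [28275.0, 28274.0],
        "ask": [28278.0, 28279.0],
        "ask_size": [0.2, 0.3],      # illustrative sizes, not from the data
    }),
}


def top_of_book(order_books: dict) -> pd.DataFrame:
    """Collect row 0 (best bid and best ask) of every order book."""
    rows = {ts: ob.iloc[0] for ts, ob in order_books.items()}
    tob = pd.DataFrame.from_dict(rows, orient="index")
    tob.index = pd.to_datetime(tob.index)
    return tob.sort_index()


tob = top_of_book(books)
print(tob)
```

The result has one row per order book, indexed by timestamp, which is the shape needed to compute a price series such as the mid price.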



Next, the mid price computed from the top of each order book is described and some summary statistics are shown:

In [5]:
help(OrderBookMeasures.mid_price)
Help on function mid_price in module functions:

mid_price(self) -> pandas.core.frame.DataFrame
    This method caculates the mid price of top of the orderbook.
    
    Parameters
    ----------
    Initialized on instance:
        data_ob: orderbook data.
        ob_ts: list of timestamps of orderbooks.
    
    Returns
    ------
    Mid price: DataFrame

In [6]:
OrderBookMeasures(order_books_data).mid_price().describe()
Out[6]:
mid_price
count 2401.000000
mean 28351.878592
std 42.215634
min 28270.000000
25% 28315.500000
50% 28349.500000
75% 28384.500000
max 28444.500000


5. Model and Plot Classes Instance


In this section, the model and visualization classes are instantiated in order to access the methods that produce the calculations and graphs of each model. The architecture can be consulted in the laboratory 2 repository. Both classes were developed to keep the models and plots of the order book together, since the order book is treated as an object, which also keeps the code better organized.

In [7]:
model = PricingModelsOB(order_books_data)
In [8]:
plot = PlotsModelsOB()


6. Results


In this section, the objects initialized in the previous sections are used to call the methods that calculate and present the results of the models. Each subsection first describes the purpose of the model at a high level; the results are then shown in dataframes and graphs and discussed.

6.1 APT Model

The APT model is based on the value, at time $t$, of a payment $x_{t+1}$, which for a share is: $$x_{t+1} = p_{t+1} + d_{t+1}$$
where:

  • $x_{t+1}$: Payment $x$ at time $t$, to be made at time $t+1$
  • $p_{t+1}$: Future share price
  • $d_{t+1}$: Dividend paid per share

Next, the two-period utility function $U(c_{t}, c_{t+1}) = U(c_{t}) + \beta U(c_{t+1})$ is applied, where:

  • $c_{t}$: consumption in the present time
  • $c_{t+1}$: consumption in the future time
  • $\beta$: subjective discount factor

However, since utility at the future time is uncertain, it is approximated by its expected value. The objective is then to maximize the expected utility: $$\max_{\{k\}} \; U(c_{t}) + E[\beta U(c_{t+1})]$$

After performing the optimization and applying the expected value, we obtain: $$P_{t} = E[m_{t+1}X_{t+1}]$$ where:

  • $P_{t}$: Price of the asset at time $t$
  • $m_{t+1}$: Stochastic discount factor
  • $X_{t+1}$: payment $x$ in time $t$, to be made in time $t+1$
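The optimization step can be sketched as follows. The budget constraints and the endowments $y_{t}$ and $y_{t+1}$ are assumptions introduced here for illustration (the standard two-period setup), with $k$ taken as the number of units of the asset bought at time $t$:

$$c_{t} = y_{t} - P_{t}k, \qquad c_{t+1} = y_{t+1} + X_{t+1}k$$

Substituting both constraints into the objective and setting its derivative with respect to $k$ equal to zero gives the first-order condition:

$$-P_{t}U^{'}(c_{t}) + E[\beta U^{'}(c_{t+1})X_{t+1}] = 0$$

Solving for the price:

$$P_{t} = E\big[\beta\frac{U^{'}(c_{t+1})}{U^{'}(c_{t})}X_{t+1}\big], \qquad m_{t+1} = \beta\frac{U^{'}(c_{t+1})}{U^{'}(c_{t})}$$

Under these assumptions, the stochastic discount factor $m_{t+1}$ is the discounted ratio of marginal utilities.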

For the case of a share it is obtained that: $$X_{t+1} = P_{t+1} + d_{t+1}$$ However, in the microstructure of the market, the following simplifications can be applied:

  • $d_{t+1} = 0$: Over a very short horizon the stock does not pay dividends
  • $\beta \approx 1$: There is no time preference over very short intervals
  • $U^{'}(C_{t+1}) = U^{'}(C_{t})$: Market uncertainty does not change marginal utility over such a short interval

Therefore, the pricing equation simplifies to:

$$P_{t}=E\big[\beta\frac{U^{'}(C_{t+1})}{U^{'}(C_{t})}X_{t+1} \big] \approx E[(1)(1)P_{t+1}]$$

In other words, within the market microstructure, the model of present and future consumption reduces to a martingale-type stochastic process for prices, that is: $$E[P_{t+1}] = P_{t}$$


Because of the above, the mid price, or a weighted version of it, can be used to test the model and check whether the martingale property holds. One difference from the theoretical model is that, in the microstructure, the next price does not arrive at perfectly regular or continuous time intervals. The order book timestamps are therefore taken as the natural time scale for the model. The results of the calculations for the mid price and the weighted mid price are shown below.

6.1.1 APT Model: Mid Price

From what is described in the theory of the APT model, it is known that it is possible to use the mid price to verify whether the price follows a martingale process.


For each order book $i = 1, \dots, n$, the mid price is calculated from the top of the book as:

$$P_{m}= (P_{bi} + P_{ai})*0.5$$

Where,

  • $P_{m}$ = Mid price
  • $P_{bi}$ = Bid price
  • $P_{ai}$ = Ask price
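As a quick numerical check, the sketch below takes the mid price as the simple average of the best bid and the best ask. The bid and ask values are copied from outputs shown in this notebook; the variable names are illustrative.

```python
import pandas as pd

# Best bid and best ask of two order books, as shown in the notebook outputs
tob = pd.DataFrame(
    {"bid": [28270.0, 28275.0], "ask": [28275.0, 28278.0]},
    index=pd.to_datetime(
        ["2021-07-05T13:06:46.571Z", "2021-07-05T13:06:51.077Z"]
    ),
)

# Mid price: halfway between the best bid and the best ask
tob["mid_price"] = (tob["bid"] + tob["ask"]) * 0.5
print(tob)
```

The two resulting values, 28272.5 and 28276.5, match the mid_price column that the Roll model output reports for the same timestamps.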


Most of the prices within the same time bucket (one minute) follow a martingale process, counted as $e_{1}$, while a smaller share does not, counted as $e_{2}$: in the rows shown, $e_{1}$ covers roughly 68% to 83% of each bucket. The model therefore applies only partially to these data. Because the order books arrive at a relatively high frequency, we could theorize that prices generated continuously would follow a martingale process.
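The counting per time bucket can be sketched as follows. This is not the apt_model method itself: the function name martingale_counts is hypothetical, and the definitions of $e_{1}$ (the next price equals the current one) and $e_{2}$ (the next price differs) are assumptions, since the text does not spell them out.

```python
import pandas as pd


def martingale_counts(prices: pd.Series, by: str = "1min") -> pd.DataFrame:
    """Count, per time bucket, how often P_{t+1} == P_t (e1) or differs (e2)."""
    nxt = prices.shift(-1)
    valid = nxt.notna()              # the last price has no successor
    e1 = (prices == nxt) & valid     # assumed definition of e1
    e2 = (prices != nxt) & valid     # assumed definition of e2
    flags = pd.DataFrame(
        {"e1_count": e1.astype(int), "e2_count": e2.astype(int)}
    )
    out = flags.resample(by).sum()
    out["total"] = out["e1_count"] + out["e2_count"]
    out["e1_percentage"] = out["e1_count"] / out["total"]
    out["e2_percentage"] = out["e2_count"] / out["total"]
    return out


# First five mid prices reported in the Roll model output of this notebook
mid = pd.Series(
    [28272.5, 28272.5, 28272.5, 28276.5, 28276.5],
    index=pd.to_datetime([
        "2021-07-05T13:06:46.571Z", "2021-07-05T13:06:47.918Z",
        "2021-07-05T13:06:49.414Z", "2021-07-05T13:06:51.077Z",
        "2021-07-05T13:06:52.426Z",
    ]),
)
counts = martingale_counts(mid)
print(counts)
```

With these five prices there are four comparisons, three with an unchanged price and one with a change. The real method may treat bucket boundaries or the first observation differently, so the counts need not match the table below exactly.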

In [9]:
help(model.apt_model)
Help on method apt_model in module functions:

apt_model(price_type: str, by: str = '1T') -> pandas.core.frame.DataFrame method of functions.PricingModelsOB instance
    This method applies the apt model and delivers the count of 
    how many prices comply with a martingala process by the user
    desired time. It can be with 'mid_price¿ or 'weighted_midprice'.
    
    Parameters 
    ----------
    Initialized on instance:
        data_ob: orderbook data.
        ob_ts: list of timestamps of orderbooks.
    
    Required on calling:
        price_type: str. 'mid_price' or 'weighted_midprice'.
        by: str. Groupying time desired. e.g. '1T' (default) resample by hour.
    Returns
    ------
    Martingala count and percentages: DataFrame

In [10]:
display(model.apt_model('mid_price', by='1T').head(), model.apt_model('mid_price', by='1T').tail())
interval total e1_count e1_percentage e2_count e2_percentage
2021-07-05 13:06:00+00:00 0 8 6 0.750000 2 0.250000
2021-07-05 13:07:00+00:00 1 40 27 0.675000 13 0.325000
2021-07-05 13:08:00+00:00 2 39 31 0.794872 8 0.205128
2021-07-05 13:09:00+00:00 3 38 27 0.710526 11 0.289474
2021-07-05 13:10:00+00:00 4 40 30 0.750000 10 0.250000
interval total e1_count e1_percentage e2_count e2_percentage
2021-07-05 14:02:00+00:00 56 38 30 0.789474 8 0.210526
2021-07-05 14:03:00+00:00 57 40 33 0.825000 7 0.175000
2021-07-05 14:04:00+00:00 58 39 27 0.692308 12 0.307692
2021-07-05 14:05:00+00:00 59 38 26 0.684211 12 0.315789
2021-07-05 14:06:00+00:00 60 31 25 0.806452 6 0.193548

6.1.2 APT Model: Weighted Mid Price

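Replicating the model with the weighted mid price leads to the same conclusion as in the previous section: in the buckets shown, $e_{1}$ still accounts for roughly two thirds or more of the observations, although the counts are slightly lower than with the mid price in some buckets. This suggests that including the bid and ask volumes is not decisive for determining whether prices follow a martingale process.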


Recall that, for each order book $i = 1, \dots, n$, the weighted mid price is calculated with the following formula:

$$WP_{m}= \frac{V_{ai}}{V_{ai}+V_{bi}}*P_{bi} + \frac{V_{bi}}{V_{ai}+V_{bi}}*P_{ai}$$

Where,

  • $WP_{m}$ = Weighted Mid price
  • $P_{bi}$ = Bid price
  • $P_{ai}$ = Ask price
  • $V_{bi}$ = Bid volume
  • $V_{ai}$ = Ask volume
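A minimal sketch of this weighting, applied to the top of the first order book shown in section 4. The function name weighted_mid_price is hypothetical, and the weights follow the formula as written above (the ask volume share weights the bid, and the bid volume share weights the ask).

```python
def weighted_mid_price(bid: float, ask: float,
                       bid_size: float, ask_size: float) -> float:
    """Volume-weighted mid price of one top-of-book quote."""
    total = bid_size + ask_size
    return (ask_size / total) * bid + (bid_size / total) * ask


# Top of the book at 2021-07-05T13:06:46.571Z (values from section 4)
wmp = weighted_mid_price(28270.0, 28275.0, 0.000400, 0.025405)
mid = (28270.0 + 28275.0) * 0.5
print(round(wmp, 4), mid)
```

Because the bid volume is much smaller than the ask volume in this book, the weighted price lies close to the bid, below the plain mid price of 28272.5.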


In [11]:
display(model.apt_model('weighted_midprice', by='1T').head(), model.apt_model('weighted_midprice', by='1T').tail())
interval total e1_count e1_percentage e2_count e2_percentage
2021-07-05 13:06:00+00:00 0 8 6 0.750000 2 0.250000
2021-07-05 13:07:00+00:00 1 40 27 0.675000 13 0.325000
2021-07-05 13:08:00+00:00 2 39 26 0.666667 13 0.333333
2021-07-05 13:09:00+00:00 3 38 26 0.684211 12 0.315789
2021-07-05 13:10:00+00:00 4 40 27 0.675000 13 0.325000
interval total e1_count e1_percentage e2_count e2_percentage
2021-07-05 14:02:00+00:00 56 38 28 0.736842 10 0.263158
2021-07-05 14:03:00+00:00 57 40 27 0.675000 13 0.325000
2021-07-05 14:04:00+00:00 58 39 26 0.666667 13 0.333333
2021-07-05 14:05:00+00:00 59 38 26 0.684211 12 0.315789
2021-07-05 14:06:00+00:00 60 31 21 0.677419 10 0.322581

6.2 APT Model Plots

This section shows the results graphically for both the mid price and the weighted mid price. The graphs are shown for the absolute counts and for the percentages or proportions of each result.

6.2.1 Plots of the count of martingale process

In [12]:
help(plot.plot_apt_model_count)
Help on function plot_apt_model_count in module visualizations:

plot_apt_model_count(data: pandas.core.frame.DataFrame, price_type: str) -> plotly.graph_objs._figure.Figure
    This method plots the calculation of count of the apt model.
    
    Parameters 
    ----------
    Required on calling:
        data: DataFrame. e.g. 'mid_price' or 'weighted_midprice'.
             It should contain columns 'e1_count' and 'e2_count'
        price_type: str. The name of data to plot.
    Returns
    ------
    Plot e1 and e2 count: Figure

In [13]:
display(plot.plot_apt_model_count(model.apt_model('mid_price', by='1T'), 'Mid Price'),
       plot.plot_apt_model_count(model.apt_model('weighted_midprice', by='1T'), 'Weighted Mid Price'))
None
None

6.2.2 Plots of the percentage of martingale process

In [14]:
help(plot.plot_apt_model_percentage)
Help on function plot_apt_model_percentage in module visualizations:

plot_apt_model_percentage(data: pandas.core.frame.DataFrame, price_type: str) -> plotly.graph_objs._figure.Figure
    This method plots the calculation of percentage of the apt model.
    
    Parameters 
    ----------
    Required on calling:
        data: DataFrame. e.g. 'mid_price' or 'weighted_midprice'.
             It should contain columns 'e1_percentage' and 'e2_percentage'
        price_type: str. The name of data to plot.
    Returns
    ------
    Plot e1 and e2 percentage: Figure

In [15]:
display(plot.plot_apt_model_percentage(model.apt_model('mid_price', by='1T'), 'Mid Price'),
       plot.plot_apt_model_percentage(model.apt_model('weighted_midprice', by='1T'), 'Weighted Mid Price'))
None
None

6.3 The Roll Model

The Roll model seeks to estimate the spread from the prices $p_{t}$; with that spread, the theoretical bid and ask can then be determined.

Firstly, the model proposes two equations, one that models the value of the financial asset and the other the transaction price: $$m_{t} = m_{t-1} + u_{t}$$ $$p_{t} = m_{t}+Cq_{t}$$

where:

  • $m_{t}$: Efficient price at time $t$
  • $m_{t-1}$: Efficient price at time $t-1$
  • $p_{t}$: Transaction price at time $t$
  • $C$: Transaction cost imposed by dealers
  • $u_{t}$: Random component with distribution $N(0,\sigma^{2})$
  • $q_{t}$: Binary variable (buy $+1$, sell $-1$)

Furthermore, the model needs the price changes $\Delta p_{t}$ and $\Delta p_{t-1}$: $$\Delta p_{t} = Cq_{t}-Cq_{t-1} + u_{t}$$ $$\Delta p_{t-1} = Cq_{t-1}-Cq_{t-2} + u_{t-1}$$

However, it is important to consider that:

  • $E[q_{t-1}q_{t}] = 0$: There is no serial correlation in the trade direction
  • $E[q_{t}^{2}] = 1$: because $q_{t}$ only takes the values $+1$ and $-1$
  • $E[u_{t}]=0$: because $u_{t}$ is distributed as $N(0,\sigma^{2})$
  • $E[u_{t}^{2}] = \sigma_{u}^{2}$

So $Var(\Delta p_{t})$ and $Cov(\Delta p_{t-1},\Delta p_{t})$ are:

$$Var(\Delta p_{t})=2C^{2}+\sigma_{u}^{2}$$
$$Cov(\Delta p_{t-1},\Delta p_{t})=-C^{2}$$
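These two moments can be checked numerically with a small simulation of the model's two equations. The sketch below is only an illustration: the values of $C$ and $\sigma_{u}$, the sample size and the random seed are arbitrary assumptions, not taken from the order book data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, C, sigma_u = 200_000, 0.5, 1.0      # arbitrary illustrative values

u = rng.normal(0.0, sigma_u, size=n)   # efficient-price innovations u_t
q = rng.choice([-1.0, 1.0], size=n)    # trade direction: buy +1, sell -1
m = np.cumsum(u)                       # m_t = m_{t-1} + u_t
p = m + C * q                          # p_t = m_t + C q_t

dp = np.diff(p)                        # price changes
var_dp = dp.var()
gamma_1 = np.cov(dp[1:], dp[:-1])[0, 1]   # first-order autocovariance

print("Var(dp):", var_dp, "theory:", 2 * C**2 + sigma_u**2)
print("Cov(dp_t, dp_t-1):", gamma_1, "theory:", -C**2)
```

With a sample this large, the simulated variance and autocovariance should land close to the theoretical values $2C^{2}+\sigma_{u}^{2}$ and $-C^{2}$.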



To calculate the spread, the constant $C$ is obtained from the equality $\gamma_{1} = Cov(\Delta p_{t-1}, \Delta p_{t}) = -C^{2}$ where, for the data used here (the mid price of the top of the book):

  • $\gamma_{1} = Cov(\Delta p_{t-1}, \Delta p_{t})$ is the first-order autocovariance of the price changes
  • $\Delta p_t$ is the difference between consecutive mid prices
  • $\Delta p_{t-1}$ is the same difference lagged one period
  • $C$ is the constant (commission)

Solving for $C$:

$$\gamma_{1} = -C^{2}$$
$$C = \sqrt{-\gamma_{1}}$$

Note that this requires $\gamma_{1} < 0$.


Once $C$ has been calculated, it is possible to calculate the spread, which according to the model is defined as $a_{t} - b_{t} = 2C$, and therefore the theoretical bid and ask, where $P_{t}$ is the mid price: $$\bar{bid} = P_{t}- C$$
$$\bar{ask} = P_{t} + C$$
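The estimation steps above can be sketched as follows. This is an illustration and not the roll_model method itself: the function name roll_estimate is hypothetical, the output column names are borrowed from the notebook output, and the price series is synthetic (a pure bid-ask bounce) so that $\gamma_{1}$ is negative.

```python
import numpy as np
import pandas as pd


def roll_estimate(mid: pd.Series) -> pd.DataFrame:
    """Estimate C from the autocovariance of price changes, then bid and ask."""
    dp = mid.diff()
    gamma_1 = dp.cov(dp.shift(1))      # Cov(dp_t, dp_{t-1}), NaN pairs dropped
    if not gamma_1 < 0:
        raise ValueError("gamma_1 must be negative to take the square root")
    c = np.sqrt(-gamma_1)
    out = pd.DataFrame({"mid_price": mid})
    out["bid_calculated"] = mid - c
    out["ask_calculated"] = mid + c
    out["c"] = c
    out["spread_calculated"] = 2 * c
    return out


# Synthetic prices bouncing between two levels (hypothetical data)
prices = pd.Series([100.0, 100.5, 100.0, 100.5, 100.0, 100.5])
result = roll_estimate(prices)
print(result)
```

The explicit check on the sign of $\gamma_{1}$ is a design choice of this sketch: when the sample autocovariance is zero or positive, the square root is not defined and the model cannot produce a spread.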


After applying the model to the order book data, it is possible to notice that the theoretical spread is small ($2C \approx 0.07$, against observed spreads of 3 to 7 in the rows shown), and the same holds for $C$. As a result, the theoretical bid and ask follow the behavior of the observed bid and ask without any substantial change. The model spread is also constant, which differs from the behavior of the observed spread. All of the above can be seen in the dataframe and graphs of the following sections.

6.3.1 Calculation of Bid and Ask with the Mid Price

The following dataframe compares the observed prices with the prices calculated with the Roll model, and shows the behavior described in the previous section. The most important points are that the magnitude of $C$ does not produce a significant change in the calculated bid and ask, and that the calculated spread $2C$, unlike the observed spread, is a constant.

In [16]:
help(model.roll_model)
Help on method roll_model in module functions:

roll_model() -> pandas.core.frame.DataFrame method of functions.PricingModelsOB instance
    This method applies the Roll model and delivers the a 
    dataframe that contains the actual bid-ask as well as
    theoretical bid-ask calculated with the model.
    
    Parameters 
    ----------
    Initialized on instance:
        data_ob: orderbook data.
        ob_ts: list of timestamps of orderbooks.
    
    Required on calling:
        No required.
    
    Returns
    ------
    bid and ask theorical prices: DataFrame

In [17]:
display(model.roll_model().head(),model.roll_model().tail())
mid_price bid bid_calculated bid_delta ask ask_calculated ask_delta c spread spread_calculated delta_spred
2021-07-05 13:06:46.571000+00:00 28272.5 28270.0 28272.464978 -2.464978 28275.0 28272.535022 2.464978 0.035022 5.0 0.070044 4.929956
2021-07-05 13:06:47.918000+00:00 28272.5 28270.0 28272.464978 -2.464978 28275.0 28272.535022 2.464978 0.035022 5.0 0.070044 4.929956
2021-07-05 13:06:49.414000+00:00 28272.5 28270.0 28272.464978 -2.464978 28275.0 28272.535022 2.464978 0.035022 5.0 0.070044 4.929956
2021-07-05 13:06:51.077000+00:00 28276.5 28275.0 28276.464978 -1.464978 28278.0 28276.535022 1.464978 0.035022 3.0 0.070044 2.929956
2021-07-05 13:06:52.426000+00:00 28276.5 28275.0 28276.464978 -1.464978 28278.0 28276.535022 1.464978 0.035022 3.0 0.070044 2.929956
mid_price bid bid_calculated bid_delta ask ask_calculated ask_delta c spread spread_calculated delta_spred
2021-07-05 14:06:40.583000+00:00 28358.5 28355.0 28358.464978 -3.464978 28362.0 28358.535022 3.464978 0.035022 7.0 0.070044 6.929956
2021-07-05 14:06:41.919000+00:00 28358.5 28355.0 28358.464978 -3.464978 28362.0 28358.535022 3.464978 0.035022 7.0 0.070044 6.929956
2021-07-05 14:06:43.416000+00:00 28358.5 28355.0 28358.464978 -3.464978 28362.0 28358.535022 3.464978 0.035022 7.0 0.070044 6.929956
2021-07-05 14:06:45.070000+00:00 28356.5 28354.0 28356.464978 -2.464978 28359.0 28356.535022 2.464978 0.035022 5.0 0.070044 4.929956
2021-07-05 14:06:46.412000+00:00 28356.5 28354.0 28356.464978 -2.464978 28359.0 28356.535022 2.464978 0.035022 5.0 0.070044 4.929956

6.4 The Roll Model Plots

This section shows graphically the behavior of the bid, ask and spread calculated with the Roll model against the actual ones. From the graphs we can conclude that the calculated bid and ask stay quite close to the actual ones, because the magnitude of $C$ is small and therefore barely shifts them away from the mid price, which is itself the average of the actual bid and ask.

6.4.1 Plots of the bid and ask calculation with the Roll Model

In [18]:
help(plot.plot_roll_model)
Help on function plot_roll_model in module visualizations:

plot_roll_model(data: pandas.core.frame.DataFrame, serie_type: str = 'calculated') -> plotly.graph_objs._figure.Figure
    This method plots the calculation of bid and ask  with Roll model. In case 
    a comparison against the actuals, it is possible to plot them.
    
    Parameters 
    ----------
    Required on calling:
        data: DataFrame with columns ['bid_calculated','bid','ask_calculated','ask'].
        type: str 'calculated' or 'actual'. Default 'calculated'.
    Returns
    ------
    Plots of all columns as subplots: Figure

In [19]:
plot.plot_roll_model(model.roll_model())
In [20]:
plot.plot_roll_model(model.roll_model(), serie_type='actual')

6.4.2 Plots of the Spread and Spread calculated with the Roll Model

In [21]:
help(plot.plot_spread_comparison_series)
Help on function plot_spread_comparison_series in module visualizations:

plot_spread_comparison_series(data: pandas.core.frame.DataFrame) -> plotly.graph_objs._figure.Figure
    This method plots the calculated spread and the actual spread.
    
    Parameters 
    ----------
    Required on calling:
        data: DataFrame with columns ['spread_calculated',''spread'].
    Returns
    ------
    Plots of comparison b/w spreads as series: Figure

In [22]:
plot.plot_spread_comparison_series(model.roll_model()) 
In [23]:
help(plot.plot_spread_comparison_bars)
Help on function plot_spread_comparison_bars in module visualizations:

plot_spread_comparison_bars(data: pandas.core.frame.DataFrame, by: str = '2T') -> plotly.graph_objs._figure.Figure
    This method plots the calculated spread and the actual spread.
    
    Parameters 
    ----------
    Required on calling:
        data: DataFrame with columns ['spread_calculated',''spread'].
    Returns
    ------
    Plots of comparison b/w spreads in bars: Figure

In [24]:
plot.plot_spread_comparison_bars(model.roll_model()) 

7. Conclusion

The models can be applied to the data, although they rely on assumptions that are difficult to meet with real data. The APT model would apply to the order book if new bids and asks were added every few milliseconds or even nanoseconds; even so, at the relatively low frequency of the data used here, the martingale process holds in most of the cases studied per time bucket.


In the Roll model, the calculated spread yields a calculated bid and ask that resemble the observed ones, with the important difference that the calculated spread is constant while the actual or observed spread varies. Finally, both models explain, to a certain extent, the behavior of prices through theoretical concepts that hold under the right conditions. This may serve as a foundation for more advanced price prediction or decision-making models.


8. References


  1. Bid Ask Spread - Roll Math. (2019, July 19). [Video]. YouTube. https://www.youtube.com/watch?v=-EvbFYUSAS0

  2. Hasbrouck, J. (2007). Empirical Market Microstructure: The Institutions, Economics, and Econometrics of Securities Trading (Illustrated ed.) [E-book]. Oxford University Press.

  3. Muñoz-Elguezabal, J. F. (2017, July 17). Asset Pricing Theory. Canvas. Retrieved June 18, 2022, from https://iteso.instructure.com/courses/27849/files/3808697?module_item_id=895296